Add rule-based fragmentation #17
Open: jackgisby wants to merge 126 commits into dev from feat-rule_based_fragmenter
jackgisby (Collaborator) commented on Jul 14, 2022:
- Add rule-based fragmentation
- Restructure modules
- Switch to conda-incubator for setting up Actions CI
- Remove unused dependencies
…ate mc, exact_mass, MSn masses
…or the correlated query
Codecov Report
@@            Coverage Diff             @@
##              dev      #17      +/-   ##
==========================================
- Coverage   94.86%   94.62%   -0.25%
==========================================
  Files           7        8       +1
  Lines         955     1190     +235
==========================================
+ Hits          906     1126     +220
- Misses         49       64      +15
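The figures in the report can be cross-checked with a few lines of Python. This is only a sketch, assuming that line coverage is computed as hits / (hits + misses); the helper name `coverage_percent` is hypothetical and is not part of Codecov or metaboblend.

```python
def coverage_percent(hits: int, misses: int) -> float:
    """Line coverage in percent, assuming coverage = hits / (hits + misses)."""
    total = hits + misses
    return 100.0 * hits / total


# Hits and misses copied from the report above.
base = coverage_percent(906, 49)     # dev: 906 + 49 = 955 lines
head = coverage_percent(1126, 64)    # this PR: 1126 + 64 = 1190 lines
delta = head - base                  # change in percentage points

print(f"base={base:.3f}% head={head:.3f}% delta={delta:+.3f}")
```

Under that assumption, the unrounded change is roughly a quarter of a percentage point, consistent with the reported -0.25, even though the two rounded percentages in the report differ by 0.24.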
jackgisby added a commit to jackgisby/metaboblend that referenced this pull request on Nov 7, 2022:
* Compatibility with conda version of geng; remove geng tool from package
* Incorporate pkl files into connectivity database
* Add nauty as dependency
* Add pickle as test dependency
* Switch from strings to pickles for connectivity graphs
* Use blob instead of text to store pickled dictionary
* No longer write substructures to .smi
* Add option to build to select only frequent substructures
* Add connectivity filter to k_configs
* Incorporate connectivity filter into MSn build method
* Build substructures for each set of masses independently
* Call itertools.product on substructures within multiprocessing portion of build
* Configure run script for current create_isomorphism_database inputs
* Built subsets should be empty list, not None
* Update variable names, remove debug options, update docstrings
* Add annotate_msn and generate_structures user functions
* Move stage at which multiprocessing step is performed
* Allow for multiple output options in build
* Remove ppm option for retrieving elemental composition from substructure db
* Allow list of mc/exact_mass to be passed to generate_structures
* Use TemporaryDirectory to store unittest results
* Let generate_structures return/yield smiles
* Implement build_msn to incorporate considerations for building structures from MS/MS
* Implement annotate_msn to provide an interface to build_msn
* Add/update build docstrings
* Remove unnecessary build parameters
* Pass data dictionary to user-facing build functions rather than separate mc, exact_mass, MSn masses
* Update variable naming conventions
* Add newline between smiles in out file
* Update SubstructureDb for removal of .pkl files
* Add function create_substructure_database
* Bring tests up to date with variable renaming
* Bring scripts up to date with variable renaming
* Simplify loading of test data and remove teardown
* Remove unused class ConnectivityDb and update SubstructureDb parameters
* Implement additional non-msn build tests
* Improve temporary table cleaning logic
* Fix issues with new build functions
* Allow tests to load auxiliary test data
* Implement msn tests and update k_config test for new parameter
* Correctly specify ppm in generate_structures
* Minor docstring and code reformatting
* Add binder dir
* Add example notebook
* Remove scripts
* Implement basic notebook
* Add small substructures to database prior to msn annotation
* Complete notebook example
* Fix logic for when smi_out_dir is None
* Rename example_msms.ipynb to workflow.ipynb
* Add pip to install metaboblend
* Add data dir, remove databases dir, move test data to data dir
* Write notebook databases to notebook_data
* Unzip test data
* Simplify test paths
* Remove databases from gitignore
* Use test databases for notebook
* Implement simple hydrogenation rules
* Get bond types rather than number of available atoms for hydrogen rule calculations
* Don't count dummy atoms for bond type calculations
* Remove dummy atom mass
* Use max_degree of 6 and 2 available_atoms by default for create_substructure_database
* Account for the fact we use neutral peaks (i.e. have removed adduct ion)
* Modify hydrogen re-arrangement rules for double bonds
* Update databases tests
* Implement test for calculate_possible_hydrogenations using reference numbers
* Add test for calculate_hydrogen_rearrangements
* Update hydrogen re-arrangement calculation function documentation
* Update remaining unit tests
* Add hydrogen re-arrangement compound HMDB XMLs
* Record even substructures
* Record even substructures in results DB
* Add indexes to improve combine_ecs function performance
* Improve results DB hierarchy and implement aggregation of scoring metrics
* Define SQLite functions to calculate scores via queries alone
* Record max BDE in spectra results table
* Calculate frequency in the absence of scores (for non-MSn method)
* Retain substructures does not cause substructures not to be initially recorded
* Add additional scoring metrics
* Update results db test data
* Define ppm error and valence of fragment prior to re-ordering
* Configure checks on recording of putative structure information
* Calculate scores at substructure combination level
* Convert True to 1 and False to 0 for conversion to SQLite boolean type
* Index results DB
* Use a loop in place of pool.map
* Minor performance improvements
* Merge minor performance improvements
* Use the minimum absolute error for getting possible fragment ions
* Add separate absolute error options for MSn peak and full structure
* Use 0.005 for abs_error_precursor
* Drop indexes before inserting into results DB
* Add results table index on ms_id_num and structure_smiles
* Update results DB tests
* Add table for generating unique structure smiles IDs
* Calculate cosine spectrum similarity
* Allow for the specification of weights for the results database scoring calculations
* Aggregate structure scores but force floating point division
* Select fragment and substructure id when calculating results scores for the correlated query
* Update results DB tests with updated scores
* Don't create indexes until structure scoring
* Don't include valence=0 substructures in the substructure database
* Add max BDE parameter for building
* Remove redundant connectivity graphs
* Update data to test filter records function
* Update dictionary pickle with Python 3.7
* Update file header
* Update contact information
* Update setup.py
* Update tests for RDKit changes
* Update README
* Keep functioning buttons
* Update testing workflow
* Use python 3.7
* Remove unused dependencies
* Use only the channel conda-forge
* Add pillow and pyqt dependencies
* Remove list definition in function arguments
* Add algorithms test
* Merge database tests into single file
* Restructure modules
* Restructure tests
* Update outdated imports
* Omit notebooks from coverage

Co-authored-by: Ralf Weber <[email protected]>